Learning Distance Functions: Algorithms and Applications
Abstract
This thesis presents research in the field of distance function learning. Distance functions are used extensively in many application domains and also serve as an important building block in many types of algorithms. Despite their prevalence, until recently only canonical distance functions such as the Euclidean distance were used, or else application-specific distance functions were proposed, which in most cases were hand-designed to incorporate domain-specific knowledge. In the last several years there has been a growing body of work on algorithms for learning distance functions. A considerable number of distance learning algorithms have been suggested, most of which aim at learning a restricted form of distance function called a Mahalanobis metric. In this thesis I present three novel distance learning algorithms:
1. Relevant Component Analysis (RCA): an algorithm for learning a Mahalanobis metric using positive equivalence constraints.
2. DistBoost: a boosting-based algorithm that can learn highly non-linear distance functions using equivalence constraints.
3. KernelBoost: a variant of the DistBoost algorithm that learns kernel functions, which can be used in any kernel-based classifier.
I then describe their applications to various data domains, including clustering, image retrieval, computational immunology, auditory data analysis and kernel-based classification. In all of these domains, using a learned distance function instead of a standard off-the-shelf distance function yields a significant improvement. These results demonstrate the importance of this growing research field.
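The abstract names RCA only at a high level. The sketch below illustrates the basic RCA step as it is commonly stated, assuming positive equivalence constraints are given as index pairs. Pairs are merged into "chunklets" (groups of points known to share a class) by transitive closure, the within-chunklet covariance C is estimated, and C^{-1} is then used as the Mahalanobis matrix (equivalently, the data are whitened by W = C^{-1/2}). The function names, the `eps` ridge term and the pair-list input format are illustrative assumptions, and any further steps of the thesis's full algorithm (such as dimensionality reduction) are not reproduced here.

```python
import numpy as np


def chunklets_from_pairs(n_points, positive_pairs):
    """Group points into chunklets: connected components of the positive
    equivalence constraints, computed with a small union-find."""
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in positive_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components

    groups = {}
    for i in range(n_points):
        groups.setdefault(find(i), []).append(i)
    # Singletons carry no constraint information, so they are dropped.
    return [g for g in groups.values() if len(g) > 1]


def rca(X, chunklets, eps=1e-8):
    """Return (M, W): M = C^{-1} for the within-chunklet covariance C,
    and the symmetric whitening transform W = C^{-1/2}."""
    d = X.shape[1]
    C = np.zeros((d, d))
    n = 0
    for idx in chunklets:
        Z = X[idx] - X[idx].mean(axis=0)  # center each chunklet on its own mean
        C += Z.T @ Z
        n += len(idx)
    C /= n
    C += eps * np.eye(d)  # hypothetical small ridge so C stays invertible
    vals, vecs = np.linalg.eigh(C)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T  # symmetric C^{-1/2}
    M = W.T @ W  # equals C^{-1}
    return M, W


def mahalanobis(x, y, M):
    """Mahalanobis distance sqrt((x - y)^T M (x - y))."""
    diff = x - y
    return float(np.sqrt(diff @ M @ diff))
```

As a small usage example, take `X = np.array([[0, 0], [4, 0], [0, 1], [4, 1], [2, 0.3]], dtype=float)` with constraints `[(0, 1), (1, 4), (2, 3)]`. Within each chunklet the points vary mostly along the first feature, so RCA down-weights that direction: the Euclidean ordering d(x0, x1) = 4 > d(x0, x2) = 1 is reversed to roughly 2.24 versus 9.13 under the learned metric. Whitening with `W` and then using the plain Euclidean distance gives the same result as using `M` directly.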
The first two chapters of this work present a general introduction to the field of distance functions and distance function learning, with some additional background on semi-supervised learning.
Chapter 1, Introduction: In Chapter 1 we provide a general introduction to distance functions and some reasons why distance learning is an important and interesting learning scenario. We then provide a detailed overview of canonical and hand-designed distance functions. The algorithms presented in this thesis all come from the field of semi-supervised learning. We therefore present a short introduction to semi-supervised learning, with a specific focus on learning using equivalence constraints, which is
Similar resources
Semi-supervised learning of a composite kernel using distance metric learning techniques
Distance metrics play a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric directly affects the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
Sample size determination for logistic regression
The problem of sample size estimation is important in medical applications, especially when measurements of immune biomarkers are expensive. This paper describes sample size determination for logistic regression analysis using several algorithms, namely the methods of univariate statistics, logistic regression, cross-validation and Bayesian inference. The authors, treating the regr...
EMCSO: An Elitist Multi-Objective Cat Swarm Optimization
This paper introduces a novel multi-objective evolutionary algorithm based on the cat swarm optimization algorithm (EMCSO) and applies it to a multi-objective knapsack problem. Multi-objective optimizers try to find the solutions closest to the true Pareto-optimal front (POF), which is achieved by finding less-crowded non-dominated solutions. The proposed method applies cat swarm optim...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, the retrieved objects must be ranked by their similarity to the query. As generic measures such as ...